IN LIST: reinterpret FixedSizeBinary for primitive fast paths #23018
geoffreyclaude wants to merge 5 commits
Build Int8 and Int16 IN-list bitmap filters by reinterpreting the input buffers as UInt8 or UInt16 with the same byte width. This avoids copying or numeric conversion while preserving signed integer equality semantics.
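A minimal sketch of the idea for `Int8`, using only the standard library. The `ByteBitmap` type and its methods are hypothetical names for illustration, not the PR's actual filter; the point is that casting `i8` to `u8` keeps the bit pattern, so equality is preserved without any numeric conversion.

```rust
/// 256-bit membership bitmap for one-byte values (hypothetical name).
pub struct ByteBitmap {
    bits: [u64; 4],
}

impl ByteBitmap {
    /// Build the bitmap from signed needles by reusing each value's
    /// bit pattern as an unsigned index in 0..=255.
    pub fn from_i8(needles: &[i8]) -> Self {
        let mut bits = [0u64; 4];
        for &n in needles {
            let idx = n as u8 as usize; // same bits, e.g. -1 maps to 255
            bits[idx >> 6] |= 1u64 << (idx & 63);
        }
        Self { bits }
    }

    /// Membership test: one shift and one mask, no hashing.
    pub fn contains(&self, v: i8) -> bool {
        let idx = v as u8 as usize;
        (self.bits[idx >> 6] >> (idx & 63)) & 1 == 1
    }
}
```

Two distinct `i8` values always map to two distinct `u8` indices, which is why signed equality semantics survive the reinterpretation.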
Adds a const-generic unrolled comparison chain that avoids CPU branching. Outperforms hash lookups for very small lists. Triggers for primitives when list size <= 32 (4-byte), 16 (8-byte), or 4 (16-byte).
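The comparison chain might look roughly like the following sketch. The function names are hypothetical, and whether the loop is fully unrolled is up to the compiler; the sketch only shows the shape of the technique: a list length known at compile time and a non-short-circuiting `|=` instead of `||`.

```rust
/// Returns true if `value` equals any of the `N` needles.
/// `|=` evaluates every comparison, so there is no early exit to predict.
#[inline]
pub fn contains_branchless<const N: usize>(needles: &[u32; N], value: u32) -> bool {
    let mut hit = false;
    for n in needles {
        hit |= *n == value;
    }
    hit
}

/// Apply the membership test to every value in `haystack`.
pub fn filter_branchless<const N: usize>(needles: &[u32; N], haystack: &[u32]) -> Vec<bool> {
    haystack
        .iter()
        .map(|&v| contains_branchless(needles, v))
        .collect()
}
```

Because `N` is a const generic, each list length gets its own monomorphized copy of the loop, which is what makes unrolling practical for the small sizes quoted above.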
Implements a fast hash table using open addressing with linear probing and a 25% load factor. Replaces the legacy HashSet for primitives, reducing indirection. Triggers for primitives when list size exceeds branchless thresholds.
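As an illustration of open addressing with linear probing at a 25% load factor, here is a small standard-library sketch. `ProbeSet`, its hash function, and the multiplier constant are assumptions made for this example and are not taken from the PR.

```rust
/// Open-addressing set of u64 keys with linear probing (hypothetical name).
pub struct ProbeSet {
    slots: Vec<Option<u64>>,
    mask: usize,
}

impl ProbeSet {
    pub fn new(needles: &[u64]) -> Self {
        // Power-of-two capacity of at least 4x the needle count,
        // so at most 25% of the slots are ever occupied.
        let cap = (needles.len().max(1) * 4).next_power_of_two();
        let mut set = Self { slots: vec![None; cap], mask: cap - 1 };
        for &n in needles {
            set.insert(n);
        }
        set
    }

    /// Multiplicative hash; the odd multiplier is an arbitrary choice.
    fn slot(&self, v: u64) -> usize {
        (v.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32) as usize & self.mask
    }

    fn insert(&mut self, v: u64) {
        let mut i = self.slot(v);
        loop {
            match self.slots[i] {
                Some(x) if x == v => return, // duplicate needle
                None => {
                    self.slots[i] = Some(v);
                    return;
                }
                _ => i = (i + 1) & self.mask, // probe the next slot
            }
        }
    }

    pub fn contains(&self, v: u64) -> bool {
        let mut i = self.slot(v);
        loop {
            match self.slots[i] {
                Some(x) if x == v => return true,
                None => return false, // an empty slot ends the probe chain
                _ => i = (i + 1) & self.mask,
            }
        }
    }
}
```

The low load factor guarantees that empty slots exist, so both loops terminate, and it keeps probe chains short. Storing keys inline in one flat vector is what removes the extra indirection compared with a general-purpose `HashSet`.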
Introduces a two-stage filter for ByteView types. Stage 1 uses a fast DirectProbeFilter on masked views (len + prefix) for quick rejection; Stage 2 performs full verification only for potential long-string matches. Triggers for Utf8View and BinaryView.
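The two stages can be sketched with plain byte slices. This is a simplified model under stated assumptions: the masked key here is the length plus a 4-byte prefix packed into a `u64`, a `HashSet<u64>` stands in for the `DirectProbeFilter`, and values of at most 4 bytes are treated as fully described by their key. The real Arrow view layout inlines more bytes than this, and `TwoStageFilter` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Pack (length, first 4 bytes, zero-padded) into one u64 key.
/// Assumes lengths fit in 32 bits.
fn masked_key(s: &[u8]) -> u64 {
    let mut prefix = [0u8; 4];
    let n = s.len().min(4);
    prefix[..n].copy_from_slice(&s[..n]);
    ((s.len() as u64) << 32) | u32::from_le_bytes(prefix) as u64
}

pub struct TwoStageFilter {
    keys: HashSet<u64>,     // stage 1: cheap rejection on length + prefix
    full: HashSet<Vec<u8>>, // stage 2: full verification for longer values
}

impl TwoStageFilter {
    pub fn new(needles: &[&[u8]]) -> Self {
        Self {
            keys: needles.iter().map(|n| masked_key(n)).collect(),
            full: needles.iter().map(|n| n.to_vec()).collect(),
        }
    }

    pub fn contains(&self, s: &[u8]) -> bool {
        // Stage 1: most non-matches differ in length or prefix and stop here.
        if !self.keys.contains(&masked_key(s)) {
            return false;
        }
        // Stage 2: short values are fully captured by the key;
        // longer ones still need a full byte comparison.
        s.len() <= 4 || self.full.contains(s)
    }
}
```

Two long strings that share a length and a prefix pass stage 1 together, which is exactly the case stage 2 exists to resolve.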
FixedSizeBinary(N) arrays share the same contiguous buffer layout as primitive arrays, so for power-of-2 widths (1, 2, 4, 8, 16) we can zero-copy reinterpret them and use the optimized primitive filters (bitmap, branchless, hash) instead of falling through to the NestedTypeFilter fallback.
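Putting the widths and list-size thresholds from these commit messages together, the dispatch can be pictured as below. The `Strategy` enum and `choose_strategy` function are hypothetical names for illustration; only the widths and thresholds come from the text above.

```rust
/// Which lookup strategy a fixed-width value would use (hypothetical enum).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Strategy {
    Bitmap,      // 1- and 2-byte values
    Branchless,  // small lists of 4-, 8-, or 16-byte values
    DirectProbe, // larger lists of 4-, 8-, or 16-byte values
    Generic,     // any other width stays on the fallback path
}

/// Pick a strategy from the value width in bytes and the IN-list length.
pub fn choose_strategy(width: usize, list_len: usize) -> Strategy {
    match width {
        1 | 2 => Strategy::Bitmap,
        4 if list_len <= 32 => Strategy::Branchless,
        8 if list_len <= 16 => Strategy::Branchless,
        16 if list_len <= 4 => Strategy::Branchless,
        4 | 8 | 16 => Strategy::DirectProbe,
        _ => Strategy::Generic,
    }
}
```

For example, `choose_strategy(16, 4)` selects the branchless chain, while `choose_strategy(16, 64)` falls over to the probe table and `choose_strategy(3, 4)` stays generic.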
Which issue does this PR close?
`IN` performance with specialized implementations #19390.
Rationale for this change
`FixedSizeBinary` means every value has the same number of bytes. For widths 1, 2, 4, 8, and 16, those bytes have the same shape as the primitive values optimized earlier in the stack.
That lets DataFusion reuse the existing fast paths without copying the bytes.
For example, a `FixedSizeBinary(4)` value is four bytes wide, just like a `UInt32`. The bytes can be checked by the same fixed-width lookup machinery. The value is still treated as binary data; this is only an internal lookup representation.
Other fixed-size binary widths stay on the generic fallback path.
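To make the `FixedSizeBinary(4)` example concrete, here is a safe standard-library sketch. It loads each 4-byte value as a `u32` key rather than reinterpreting the buffer in place as the PR describes, and `in_list_fsb4` is a hypothetical name. The key is only used for equality; no numeric meaning is attached to it.

```rust
/// Check each 4-byte value in a contiguous buffer against the needles.
/// `values` holds the values back to back, 4 bytes each.
pub fn in_list_fsb4(values: &[u8], needles: &[[u8; 4]]) -> Vec<bool> {
    assert_eq!(values.len() % 4, 0, "buffer must hold whole 4-byte values");

    // Needles and values use the same native byte order, so two keys are
    // equal exactly when the underlying four bytes are equal.
    let keys: Vec<u32> = needles.iter().map(|n| u32::from_ne_bytes(*n)).collect();

    values
        .chunks_exact(4)
        .map(|c| {
            let v = u32::from_ne_bytes([c[0], c[1], c[2], c[3]]);
            keys.contains(&v)
        })
        .collect()
}
```

The linear `keys.contains` scan is a placeholder for whichever primitive filter would be chosen for the list size.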
What changes are included in this PR?
- `FixedSizeBinary(1)` and `FixedSizeBinary(2)` through the bitmap filters.
- `FixedSizeBinary(4)`, `(8)`, and `(16)` through branchless or direct-probe filters based on list size.
- `FixedSizeBinary` needles.
- `FixedSizeBinary` widths on `ArrayStaticFilter`.
Are these changes tested?
Yes.
- `cargo fmt --all --check`
- `cargo test -p datafusion-physical-expr fixed_size_binary --lib`
- `cargo test -p datafusion-physical-expr test_in_list_from_array_type_combinations --lib`
- `cargo test -p datafusion-physical-expr reinterpreted_ --lib`
- `cargo test -p datafusion-physical-expr in_list_binary_types --lib`
- `cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings`
Are there any user-facing changes?
No. This is an internal performance optimization only.
Local benchmark snapshot
Benchmark command:
Method: compare adjacent saved baselines using raw Criterion sample minima (`min(time / iters)`). Lower is better; changes within +/-5% are treated as noise.
Compared baselines: #23016 -> #23018
Relevant scope: FixedSizeBinary rows.
Summary: 8 relevant rows, 8 faster, 0 slower, 0 within +/-5%.
- `fixed_size_binary/fsb16/list=10000/match=0%`
- `fixed_size_binary/fsb16/list=10000/match=50%`
- `fixed_size_binary/fsb16/list=256/match=0%`
- `fixed_size_binary/fsb16/list=256/match=50%`
- `fixed_size_binary/fsb16/list=4/match=0%`
- `fixed_size_binary/fsb16/list=4/match=50%`
- `fixed_size_binary/fsb16/list=64/match=0%`
- `fixed_size_binary/fsb16/list=64/match=50%`
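The comparison method described above (per-sample `time / iters`, take the minimum, treat changes within +/-5% as noise) can be sketched as follows. The function and type names are hypothetical, and the sketch does not read Criterion's saved files; it only shows the arithmetic.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Faster,
    Slower,
    Noise,
}

/// Minimum per-iteration time over (total_time, iterations) samples.
pub fn sample_min(samples: &[(f64, u64)]) -> Option<f64> {
    samples
        .iter()
        .map(|&(time, iters)| time / iters as f64)
        .min_by(|a, b| a.total_cmp(b))
}

/// Classify a branch measurement against its baseline with a 5% noise band.
pub fn classify(base: f64, branch: f64) -> Verdict {
    let change = (branch - base) / base;
    if change < -0.05 {
        Verdict::Faster
    } else if change > 0.05 {
        Verdict::Slower
    } else {
        Verdict::Noise
    }
}
```

Using the minimum rather than the mean makes the comparison less sensitive to one-off scheduling or cache disturbances in individual samples.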